ABSTRACT

Massive Open Online Courses (MOOCs), which enable large-scale
open online learning for massive numbers of users, have been
playing an important role in modern education for both students
and professionals. To keep users’ interest in MOOCs,
recommender systems have been studied and deployed to 
recommend courses or videos that a user might be inter- 
ested in. However, recommending courses and videos which 
usually cover a wide range of knowledge concepts does not 
consider user interests or learning needs regarding some spe- 
cific concepts. This paper focuses on the task of recom- 
mending knowledge concepts of interest to users, which is 
challenging due to the sparsity of user-concept interactions 
given a large number of concepts. In this paper, we propose 
an approach by modeling information on MOOC platforms 
(e.g., teacher, video, course, and school) as a Heterogeneous 
Information Network (HIN) to learn user and concept rep- 
resentations using Graph Convolutional Networks based on 
user-user and concept-concept relationships via meta-paths 
in the HIN. We incorporate those learned user and concept 
representations into an extended matrix factorization frame- 
work to predict the preference of concepts for each user. Our 
experiments on a real-world MOOC dataset show that the 
proposed approach outperforms several baselines and state- 
of-the-art methods for predicting and recommending con- 
cepts of interest to users. 
1. INTRODUCTION


MOOCs (Massive Open Online Courses), which are free on- 
line courses available to anyone to enroll around the world, 
have gained a lot of popularity in the past decade. By 
the end of 2018, popular MOOC platforms such as edX¹,


¹https://www.edx.org/


Guangyuan Piao “Recommending Knowledge Concepts on MOOC Plat- 
forms with Meta-path-based Representation Learning”. 2021. In: Pro- 
ceedings of The 14th International Conference on Educational Data Min- 
ing (EDM21). International Educational Data Mining Society, 487-494. 
https://educationaldatamining.org/edm2021/ 

EDM ’21 June 29 - July 02 2021, Paris, France 


and Coursera have provided 11,400 courses with 101 million
users/learners on those platforms. Previous studies
have shown that MOOCs do have a real impact [24, 8]. For 
example, Chen et al. [8] showed that 72% of survey respon- 
dents reported career benefits and 61% reported educational 
benefits. Despite this popularity, one main challenge of
MOOCs is that the overall completion rate of those courses is
normally lower than 10% [19, 30]. Therefore, understanding
and predicting user behaviors and learning needs are impor- 
tant to keep users learning on MOOC platforms. 


To this end, previous studies have focused on understanding 
dropout or procrastination behavior [28, 12, 38, 14] and rec- 
ommending content such as courses and learning paths that 
a user might be interested in [16, 26, 4]. A MOOC can be
seen as a sequence of videos where each video is associated 
with some knowledge concepts. For example, a video in a 
computer science MOOC can cover several concepts such as 
“software” and “hardware”. More recently, Gong et al. [13] 
argued that course or video recommendations overlook user 
interests regarding specific knowledge concepts. For exam- 
ple, data mining courses taught by different teachers can be 
quite different in a microscopic view, and a user who is in- 
terested in some specific concepts such as “association rules” 
might be interested in various video clips or learning materi- 
als from different teachers covering those concepts from dif- 
ferent perspectives. Therefore, understanding a user’s learn- 
ing needs from a microscopic view and predicting knowledge 
concepts that the user might be interested in are important. 


In this work, we focus on predicting and recommending 
knowledge concepts that might be interesting to users on 
MOOC platforms. Based on the interaction history be- 
tween users and concepts (i.e., a user has interacted with 
a concept if the user has learned that concept), traditional 
recommendation approaches such as collaborative filtering 
(CF), which recommends similar items (concepts) based
on a user’s interaction history or interesting items from
similar users, can be applied. However, the sparsity of
user-item (user-concept) relationships can limit the performance
of CF-based methods. In addition to users and concepts, 
MOOC platform data normally contain other entities such 
as courses, videos, and teachers as well as the relationships 
among those entities. 


Figure 1: Different types of entities and relationships between
two entities in MOOCs. A user u is interested in a
concept if u has learned or is going to learn it in the future.

To cope with the sparsity problem, we model those entities
and their relationships as a heterogeneous information
network (HIN) [33] consisting of the entities and relation- 
ships inspired by [13], which can be used for learning user 
(concept) representations/embeddings by exploring indirect 
user-user (concept-concept) relationships with Graph Con- 
volutional Networks (GCNs). Figure 1 illustrates such a HIN 
which we discuss in detail in Section 3. For example, one 
can derive a homogeneous user graph based on an indirect 
path in the HIN, e.g., a graph with users and edges between 
two users if they have taken the same course. Given such 
a homogeneous graph, traditional GCNs can be applied to 
the graph to learn the representations/embeddings of users 
and concepts with respect to the chosen path. 


Based on different indirect paths chosen, we can derive var- 
ious user (concept) representations, and those representa- 
tions of users (concepts) regarding different paths can be 
aggregated, e.g., using the mean of those representations. 
Instead of the straightforward mean aggregation, we propose 
and investigate different attention mechanisms to derive ag- 
gregated user (concept) representations based on different 
paths. The intuition behind using an attention mechanism 
is that different paths might have different importance for 
each user. Afterwards, those learned user and concept rep- 
resentations can be used for predicting the preference scores 
of concepts for recommendations. Our contributions in this 
work are as follows: (1) We propose an end-to-end framework
for predicting and recommending knowledge concepts
of a user’s interest in Section 4; (2) We investigate two at- 
tention mechanisms for aggregating information from differ- 
ent meta-paths (the definition can be found in Section 3) 
to derive user and concept representations. We then incor- 
porate those representations into our extended matrix fac- 
torization framework for predicting the preference score of 
a concept with respect to a user; (3) Finally, we evaluate 
our approach with several baselines and state-of-the-art ap- 
proaches in terms of well-established evaluation metrics, and 
show the effectiveness of our proposed approach in Section 6. 


2. RELATED WORK 


Recommender Systems and User Modeling on MOOC 
Platforms. There has been growing interest in recommender 
systems on MOOC platforms since 2013 with respect to dif- 
ferent aspects such as course, video, and learning paths [16, 
26, 3, 43, 9, 23]. For instance, the authors in [3] proposed
YouEDU, which is a pipeline for classifying MOOC forum 
posts and recommending instructional video clips that might 
be helpful for resolving confusion detected in those posts. 
In [21], the authors showed that peer recommendations can 
improve users’ engagement significantly in the context of a 
Project Management MOOC. Dai et al. [9] proposed analyz- 
ing course content for recommending personalized learning 
paths on MOOC platforms. Khalid et al. [18] provide a
comprehensive survey on recent advances regarding differ- 
ent recommender systems in the context of MOOCs. More 
recently, researchers have started modeling user interests in 
the context of MOOCs while user modeling has been widely 
studied in other domains such as social media [42]. For ex- 
ample, Li et al. [22] investigated the impact of acquiring 
user interests via surveys or questionnaires on course rec- 
ommendations. In [2], the authors proposed LeCoRe which 
exploits user interest modeling for recommending courses as 
well as similar users for promoting peer learning in enterprise
environments. Gong et al. [13] argued that course recommendations
overlook user interests regarding specific knowledge
concepts, and studying users’ online learning interests from 
a microscopic view and recommending knowledge concepts 
can capture user interests better and provide the flexibility 
of choosing learning resources of their interest. In this work, 
we also focus on the microscopic view for knowledge concept 
recommendations. 


Recommendation Approaches with HIN. The basic idea 
of early recommendation approaches with HIN is to leverage 
path-based semantic relatedness between users and items 
over HINs, e.g., leveraging meta-path-based similarities for 
recommendation [40, 32, 41]. For example, Shi et al. [32] 
proposed predicting item ratings based on those from similar 
users measured via different meta-paths. With the advances 
of graph representation learning, the authors in [31] pro- 
posed using pre-trained user and item embeddings based on 
meta-path information with random walk, and incorporated 
those pre-trained embeddings as features into an extended 
matrix factorization framework. The most similar work to 
ours is Gong et al. [13], which is one of the first works 
for recommending knowledge concepts on MOOC platforms 
in a heterogeneous view. The authors showed that their 
proposed approach outperforms other CF-based baselines as 
well as metapath2vec [11], which uses learned node represen- 
tations of a given HIN for knowledge concept recommenda- 
tions by measuring the similarities between two nodes. Our 
work differs from [13] in several aspects. First, we formu- 
late interacted concepts for each user as implicit feedback 
while [13] treated the number of clicks as ratings and for- 
mulated the problem as rating prediction for recommending 
top-k unknown concepts with higher ratings. Secondly, we 
investigate different attention mechanisms including the one 
incorporating the latent features of users (items) from ma- 
trix factorization. Thirdly, the prediction layer (Eq. 6) for 
estimating the preference score of a concept is different from 
[13] which uses the user (item) representations as features 
for the final prediction. 


3. PRELIMINARIES 


In this work, we consider the task of predicting and recom- 
mending concepts that a user might be interested in based 
on their learning history, which includes a set of learned
concepts and their contextual information such as courses, 
videos, etc. With $n$ users $U = \{u_1, \dots, u_n\}$ and $m$
concepts $C = \{c_1, \dots, c_m\}$, we define an implicit feedback
matrix $\mathbf{R} \in \mathbb{R}^{n \times m}$ with each entry $r_{u,c} = 1$ if $u$ has
learned $c$ and $r_{u,c} = 0$ otherwise. The task can be framed in the
context of a HIN, which is denoted as $G = \{\mathcal{V}, \mathcal{E}\}$, consisting
of an object set $\mathcal{V}$ and a link set $\mathcal{E}$. A HIN is also associated
with an object type mapping function $\phi: \mathcal{V} \rightarrow \mathcal{O}$ and a link
type mapping function $\psi: \mathcal{E} \rightarrow \mathcal{R}$, where $\mathcal{O}$ and $\mathcal{R}$ denote
the sets of predefined object and link types with $|\mathcal{O}| + |\mathcal{R}| > 2$ [33].
The MOOC data in our study can be represented as a HIN.
The HIN consists of six types of entities: user, concept,
video, course, school, and teacher. In addition, there
is a set of links describing the relationships among those 
entities. On top of the definition of HIN, the concept of 
network schema is used to describe the meta structure of a 
network [31]. 


The network schema [35] is denoted as $S = (\mathcal{O}, \mathcal{R})$. It is
a meta template for an information network $G = \{\mathcal{V}, \mathcal{E}\}$
with the object type mapping $\phi: \mathcal{V} \rightarrow \mathcal{O}$ and the link type
mapping $\psi: \mathcal{E} \rightarrow \mathcal{R}$, which is a directed graph defined over
object types $\mathcal{O}$, with edges as links from $\mathcal{R}$. Fig. 1 shows the
network schema of our MOOC dataset with the six different 
entity types and the semantic links between them. Given 
the network schema, we can extract semantic meta-paths 
between a pair of entities. A meta-path can be formally 
defined as follows: 


A meta-path [34] $MP$ is defined on a network schema $S = (\mathcal{O}, \mathcal{R})$
and is denoted as a path in the form of
$O_1 \xrightarrow{R_1} O_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} O_{l+1}$, which describes a composite relation
$R = R_1 \circ R_2 \circ \cdots \circ R_l$ between objects $O_1$ and $O_{l+1}$, where $\circ$
denotes the composition operator on relations.


4. PROPOSED APPROACH 

In this section, we introduce our proposed approach MOOCIR 
(MOOC Interest Recommender) based on meta-paths in the 
MOOC HIN. At a high level, our approach extends matrix
factorization (MF), $\hat{y}_{u,c} = x_u^\top z_c$, where $\hat{y}_{u,c}$ denotes the
predicted preference score of concept $c$ with respect to user $u$,
and $x_u$ and $z_c$ refer to the latent features of $u$ and $c$,
respectively. We extend MF with user (concept)
representations/embeddings that are learned by applying GCNs to
meta-path-based graphs. Fig. 2 shows an overview of our 
approach, which consists of four main components. In the 
following, we describe each component in detail. 


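Table 1: Meta-paths selected for extracting user-user and
concept-concept relationships.

Type      Meta-path
User      $\text{user} \rightarrow \text{concept} \xrightarrow{-1} \text{user}$
          $\text{user} \rightarrow \text{course} \xrightarrow{-1} \text{user}$
          $\text{user} \rightarrow \text{video} \xrightarrow{-1} \text{user}$
          $\text{user} \rightarrow \text{course} \rightarrow \text{teacher} \xrightarrow{-1} \text{course} \xrightarrow{-1} \text{user}$
Concept   $\text{concept} \rightarrow \text{user} \xrightarrow{-1} \text{concept}$
          $\text{concept} \rightarrow \text{course} \xrightarrow{-1} \text{concept}$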

Figure 2: Overview of our proposed approach MOOCIR.


Meta-path selection. As discussed in Section 3, meta-paths 
provide the capability to derive entity-entity relationships 
through those paths. Similar to previous studies [13, 31], 
we consider user-user and concept-concept relationships via 
different meta-paths. To fairly compare with [31] in our ex- 
periments, we use the same set of meta-paths used in [31] 
for our study. Table 1 summarizes the six meta-paths used in
our work: four for users and two for concepts.
For each meta-path, a homogeneous graph with respect to 
users (concepts) can be extracted, which is depicted as its 
corresponding adjacency matrix in Fig. 2. As one might 
expect, each entry in the adjacency matrix A regarding a 
meta-path is equal to one if two users (concepts) can be 
connected via that meta-path, and zero otherwise. Afterwards,
we can learn user (concept) representations for each
meta-path using GCNs.
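
The extraction of a homogeneous graph from a two-hop meta-path can be sketched as follows. This is only an illustration in plain Python: the function name `user_user_adjacency`, the dictionary input format, and the toy data are assumptions, and the diagonal is left at zero here because self-loops are added later in the GCN step.

```python
from itertools import combinations


def user_user_adjacency(user_items, n_users):
    """Binary adjacency for a meta-path user -> item -> user.

    adj[u][v] = 1 if users u and v share at least one item
    (e.g., a concept, course, or video), and 0 otherwise.
    """
    adj = [[0] * n_users for _ in range(n_users)]

    # Invert the mapping: item -> set of users linked to it.
    item_users = {}
    for user, items in user_items.items():
        for item in items:
            item_users.setdefault(item, set()).add(user)

    # Every pair of users sharing an item is connected via the meta-path.
    for users in item_users.values():
        for u, v in combinations(sorted(users), 2):
            adj[u][v] = adj[v][u] = 1
    return adj


# Hypothetical toy data: users 0 and 1 both took course "c2".
user_courses = {0: {"c1", "c2"}, 1: {"c2"}, 2: {"c3"}}
A_course = user_user_adjacency(user_courses, 3)
```

Longer meta-paths such as the one through teachers could be handled in the same spirit by first mapping each user to the set of teachers reachable through their courses and then calling the same function.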


Graph Convolutional Networks (GCNs). GCNs learn node
representations of a graph by inspecting neighboring nodes.
In this work, we adopt the following layer-wise propagation
rule to learn user (concept) representations/embeddings
with respect to a meta-path:


$$h^{l+1} = g(P h^{l} W^{l}) \qquad (1)$$

where $g(\cdot)$ is an activation function, for which we use ReLU [25]
here. $P = \tilde{D}^{-1}\tilde{A}$, where $\tilde{D}$ is the diagonal node degree
matrix of $\tilde{A}$ used to normalize $\tilde{A}$, and $\tilde{A} = A + I$ is
an adjacency matrix with self-loops in a graph based on a
specific meta-path. $W^{l}$ refers to a trainable weight matrix at
layer $l$ for all nodes. $h^{0}$ can be fed with features of each node
if there is a set of features for each node, or can be initialized
and learned afterwards as well. The output representation of
the last layer can be used as user (concept) representations.
For example, when $l = 2$, the representation of a user $u$
for a meta-path $MP_i$ will be $e_u^{MP_i} = h_{u,MP_i}^{2}$, where $h_{u,MP_i}^{2}$
is the output of the last layer of GCNs for the meta-path
$MP_i$ with respect to $u$. In our study, we use a single-layer
GCN where $h^{0}$ is initialized randomly and learned during
the training process, but one can easily extend it with more
layers or use existing features for $h^{0}$.
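
A single propagation step of Eq. 1 can be sketched in plain Python as follows. The helper names `matmul` and `gcn_layer` and the toy inputs are assumptions made for illustration; an actual implementation would use a tensor library and learn the weights by backpropagation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, h, w):
    """One GCN layer: ReLU(P h W) with P = D~^{-1} (A + I).

    adj: n x n binary adjacency matrix with a zero diagonal (assumed).
    h:   n x d input node representations.
    w:   d x k weight matrix.
    """
    n = len(adj)
    # Add self-loops: A~ = A + I.
    a_tilde = [[adj[i][j] + (1 if i == j else 0) for j in range(n)]
               for i in range(n)]
    # Row-normalize by node degree: P = D~^{-1} A~.
    p = [[value / sum(row) for value in row] for row in a_tilde]
    # Propagate, transform, and apply ReLU.
    out = matmul(matmul(p, h), w)
    return [[max(0.0, value) for value in row] for row in out]


# Toy example: users 0 and 1 are connected, user 2 is isolated.
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
h0 = [[1, 0], [0, 1], [1, 1]]
w0 = [[1, 0], [0, 1]]
h1 = gcn_layer(adj, h0, w0)
```

With the identity weight matrix used here, each connected user ends up with the average of its own and its neighbor's input vector, which is the smoothing effect the normalization is meant to produce.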


Attention. The attention mechanism [36] is motivated by how
we pay visual attention to different regions of an image or 
relevant words in one sentence, and has been used widely to 
advance various fields such as natural language processing 
and recommender systems [37, 13]. In our context, different 
meta-paths can have different importance with respect to 
each user, and incorporating the importance of each meta-path
differently for each user can be beneficial when aggregating
user representations from different meta-paths, i.e.,
$\{e_u^{MP_1}, \dots, e_u^{MP_{|MP|}}\} \rightarrow e_u$. In our work, we apply the
attention mechanism from [6] for our context as follows:

$$\alpha_u^{MP_i} = \frac{\exp(V_u^\top \sigma(W_u e_u^{MP_i}))}{\sum_{j \in |MP|} \exp(V_u^\top \sigma(W_u e_u^{MP_j}))} \qquad (2)$$

where the output $\alpha_u^{MP_i}$ indicates the weight (or importance)
for $e_u^{MP_i}$, and $V_u$ and $W_u$ are trainable matrices for users.
The attention mechanism can be formulated in the same
manner for concepts. Next, the user representations coming
from different meta-paths can be aggregated as follows:

$$e_u = \sum_{j \in |MP|} \alpha_u^{MP_j} e_u^{MP_j} \qquad (3)$$


The above-mentioned attention mechanism takes into account
different meta-paths but does not consider any context
in the extended MF, which can be the latent features
of users and concepts for MF. Therefore, we also investigate
the following attention mechanism, which considers the
latent features $x_u$ of a user and has not been explored in
previous studies. In this case, a meta-path-based embedding
$e_u^{MP_i}$ and $x_u$ are concatenated when calculating the
attention scores as follows:

$$\alpha_u^{MP_i} = \frac{\exp(V_u^\top \sigma(W_u [e_u^{MP_i}; f(x_u)]))}{\sum_{j \in |MP|} \exp(V_u^\top \sigma(W_u [e_u^{MP_j}; f(x_u)]))} \qquad (4)$$

where $f(x_u)$ applies non-linearity with a single-layer feed-forward
neural network to $x_u$ instead of using it directly,
which is inspired by [31], where the authors showed that non-linear
fusion is required when combining latent features from
matrix factorization and entity embeddings from GCNs.
Afterwards, the final user representation can be obtained in the
same manner as Eq. 3:

$$e_u = \sum_{j \in |MP|} \alpha_u^{MP_j} e_u^{MP_j} \qquad (5)$$


The attention mechanism can be formulated in the same 
manner for concepts. 
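
The attention-based aggregation over meta-paths can be sketched as a softmax over per-path scores followed by a weighted sum. In this plain-Python illustration, the function name `attention_aggregate` is hypothetical and ReLU is used for the inner activation purely as an assumption, since the activation is not spelled out in the text above.

```python
import math


def attention_aggregate(embeddings, w, v):
    """Aggregate per-meta-path embeddings of one user with attention.

    embeddings: list of |MP| vectors, one per meta-path.
    w:          k x d matrix (trainable in a real model).
    v:          vector of length k (trainable in a real model).
    Returns (weights, aggregated_embedding).
    """
    scores = []
    for e in embeddings:
        # Inner activation is assumed to be ReLU for this sketch.
        hidden = [max(0.0, sum(wi * ei for wi, ei in zip(row, e)))
                  for row in w]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    # Softmax over meta-paths (shifted by the max for stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]

    # Weighted sum of the per-path embeddings.
    dim = len(embeddings[0])
    aggregated = [sum(a * e[d] for a, e in zip(weights, embeddings))
                  for d in range(dim)]
    return weights, aggregated


weights, e_u = attention_aggregate(
    embeddings=[[2.0, 0.0], [0.0, 1.0]],
    w=[[1.0, 0.0], [0.0, 1.0]],
    v=[1.0, 1.0],
)
```

The variant that also uses the latent user features can reuse the same function by appending a transformed copy of those features to each embedding before scoring, with `w` widened accordingly, while still aggregating the original embeddings.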


Prediction. Given those learned user and concept
representations/embeddings $e_u$ and $e_c$, the preference score of
a concept $c$ for a user $u$ can be calculated as follows by
extending the matrix factorization framework:

$$\hat{y}_{u,c} = x_u^\top z_c + \gamma \, e_u^\top M e_c + b_c \qquad (6)$$

where $\hat{y}_{u,c}$ is the preference score, $x_u$ and $z_c$ are the latent
features for the matrix factorization, and $b_c$ is a bias term.
In addition, $M$ is a trainable matrix that maps $e_u$ into the same
space as $e_c$, and $\gamma$ is a trainable parameter for the trade-off
between the prediction scores from matrix factorization
and the user and concept embeddings.
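
The prediction step can be illustrated with a small plain-Python sketch. The function name `preference_score`, the argument name `gamma` for the trade-off parameter, and the toy numbers are assumptions for illustration only.

```python
def dot(a, b):
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def preference_score(x_u, z_c, e_u, m, e_c, gamma, b_c):
    """Extended MF score: MF term + gamma * bilinear embedding term + bias.

    x_u, z_c: latent MF features of the user and the concept.
    e_u, e_c: meta-path-based user and concept embeddings.
    m:        matrix mapping the concept embedding into the user space.
    gamma:    trade-off weight between the two terms.
    b_c:      concept bias.
    """
    mf_term = dot(x_u, z_c)
    # Compute M e_c first, then e_u^T (M e_c).
    m_e_c = [dot(row, e_c) for row in m]
    embedding_term = dot(e_u, m_e_c)
    return mf_term + gamma * embedding_term + b_c


score = preference_score(
    x_u=[1.0, 2.0], z_c=[3.0, 4.0],
    e_u=[1.0, 0.0], m=[[1.0, 0.0], [0.0, 1.0]], e_c=[2.0, 5.0],
    gamma=0.5, b_c=0.25,
)
```

In this toy case the MF term is 11, the embedding term is 2, and the score is 11 + 0.5 * 2 + 0.25 = 12.25. Setting `gamma` to zero recovers a plain MF score with a bias.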


4.1 Training Details 


Loss function. We use the Bayesian Personalized Ranking
(BPR) loss [29], which has been widely used for recommender
systems with implicit feedback [7, 5, 27]. The intuition behind
BPR is that a learned concept for a user should be ranked
higher (with a higher score) compared to a random one in
the list of concepts with which the user has not interacted,
which can be formulated as follows:

$$L = \sum_{(u,i,j) \in D_s} -\ln(\sigma(\hat{y}_{uij})) + \lambda \|\Theta\|^2 \qquad (7)$$

where $(u,i,j)$ refers to a triplet including a user $u$, an
interacted concept $i$, and an unknown concept $j$ for the user.
$\hat{y}_{uij} = \hat{y}_{ui} - \hat{y}_{uj}$ measures the preference difference between
the interacted concept and the unknown one, $\sigma$ denotes the
sigmoid function $\sigma(x) = \frac{1}{1 + e^{-x}}$, $\lambda$ is the regularization
parameter for the $L_2$ norm, and $\Theta$ denotes the set of parameters
to be learned. The training set $D_s$ can be constructed
by pairing an unknown concept randomly with an interacted
concept in the training set of a user.
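
The BPR objective can be sketched in a few lines of plain Python. The function name `bpr_loss` and the flat list of parameters are simplifications assumed for illustration; a real implementation would compute the loss over mini-batches in a tensor library.

```python
import math


def bpr_loss(score_pairs, params, lam):
    """BPR loss over (positive score, negative score) pairs.

    score_pairs: list of (y_ui, y_uj) for triplets (u, i, j).
    params:      flat list of model parameters for L2 regularization.
    lam:         regularization strength.
    """
    # -ln(sigmoid(x)) equals ln(1 + exp(-x)).
    ranking_term = sum(math.log1p(math.exp(-(y_ui - y_uj)))
                       for y_ui, y_uj in score_pairs)
    regularization = lam * sum(p * p for p in params)
    return ranking_term + regularization


# The loss shrinks as the interacted concept is scored further above
# the unknown one.
loss_tied = bpr_loss([(0.0, 0.0)], params=[], lam=0.0)
loss_good = bpr_loss([(3.0, 0.0)], params=[], lam=0.0)
```

When the two scores are tied, the ranking term equals ln 2, and it approaches zero as the margin between the interacted and the unknown concept grows.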


To learn the parameters of our proposed approach by minimizing
the loss in Eq. 7, we use mini-batch gradient descent
with a batch size of 1,024 and the Adam update
rule [20] to train the model using the training set. In addition,
the learning rate is set to 0.01, the regularization
parameter $\lambda$ is set to 1e-8, and the dimensions of the latent
features for MF and of the user (concept) embeddings are
set to 30 and 100, respectively, as in [13].


To overcome the overfitting problem, we further construct a 
validation set by using the last interacted concept for each 
user, and randomly pair each known concept with 99 un- 
known concepts. We run 500 epochs where the convergence 
is observed, and monitor the performance of evaluation met- 
rics (see Section 5) on the validation set. At the end, we 
choose the best-performing model on the validation set in 
terms of MRR (Mean Reciprocal Rank), which is one of 
the evaluation metrics measuring how well a ground truth 
concept is ranked in the corresponding set of 100 concepts. 
Any other evaluation metric can be used for choosing the 
best-performing model as well based on the preference for a 
specific metric. 
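
The construction of one 100-concept candidate set (one known concept plus 99 unknown ones) can be sketched as follows. The function name `build_candidate_set`, the integer concept ids, and the fixed random seed are assumptions for illustration.

```python
import random


def build_candidate_set(positive, interacted, n_concepts,
                        n_negatives=99, rng=None):
    """Pair one known concept with randomly sampled unknown concepts.

    positive:   id of the held-out interacted concept.
    interacted: set of concept ids the user has already interacted with.
    n_concepts: total number of concepts, with ids 0..n_concepts-1.
    """
    rng = rng or random.Random(0)
    # Unknown concepts are those the user has never interacted with.
    unknown = [c for c in range(n_concepts)
               if c not in interacted and c != positive]
    # Sample without replacement so the candidates are distinct.
    return [positive] + rng.sample(unknown, n_negatives)


candidates = build_candidate_set(positive=5, interacted={0, 1, 2},
                                 n_concepts=200)
```

The model then scores all 100 candidates, and the rank of the first element among them is what metrics such as MRR are computed from.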


5. EXPERIMENTAL SETUP 


MOOC Dataset. We use the MOOCCube dataset [39] from 
the XuetangX platform for our experiments. The MOOCCube
dataset is one of the largest and most comprehensive MOOC
datasets, and provides rich information about MOOCs and 
user activities on the platform from 2017 to 2019 [39]. Each 
course or video has a set of covered knowledge concepts in 
the dataset. In this work, we use user activities from 2017- 
01-01 to 2019-10-31 for training and those from 2019-11-01 
to 2019-12-31 for testing. We restrict the dataset to users who
have learned concepts in both the training and testing periods
and have at least one new concept (which did not appear in the
training period) in the testing period. Overall, the dataset consists of
2,005 users, 21,037 concepts, 600 courses, 22,403 videos, 137
schools, 138 teachers, and the relationships among those entities.
In total, there are 930,553 interactions between users
and concepts, with 858,072 interactions in the training set
and the rest (72,481) in the test set. The overall statistics
of the dataset are presented in Table 2.

Table 2: Statistics of the MOOCCube dataset for experiments.

Entities   Statistics   Relations        Statistics
users      2,005        user-concept     930,553
concepts   21,037       user-course      13,696
courses    600          course-video     42,117
videos     22,403       teacher-course   1,875
schools    137          video-concept    295,475
teachers   138          course-concept   150,811
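
The temporal split and user filtering described above can be sketched as follows. The function name `split_by_date`, the `(user, concept, day)` tuple format, and the choice to keep all test-period concepts (rather than only the new ones) in the test dictionary are assumptions, not details confirmed by the text.

```python
from datetime import date

TRAIN_END = date(2019, 10, 31)  # last day of the training period


def split_by_date(interactions):
    """Split (user, concept, day) records into train and test dicts.

    Keeps only users who are active in both periods and who have at
    least one test-period concept not seen during training.
    """
    train, test = {}, {}
    for user, concept, day in interactions:
        target = train if day <= TRAIN_END else test
        target.setdefault(user, set()).add(concept)

    kept = [u for u in train if u in test and test[u] - train[u]]
    return ({u: train[u] for u in kept}, {u: test[u] for u in kept})


# Hypothetical records: only user 0 learns a new concept in the test period.
records = [
    (0, "a", date(2018, 1, 1)), (0, "b", date(2019, 11, 5)),
    (1, "a", date(2018, 1, 1)), (1, "a", date(2019, 12, 1)),
    (2, "c", date(2019, 11, 2)),
]
train_set, test_set = split_by_date(records)
```

Here user 1 is dropped because the test-period concept was already learned during training, and user 2 is dropped for having no training activity.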


Evaluation Metrics. We evaluate the top-k predictions of
concepts for users with the following widely used evaluation
metrics, where k is set to 5, 10, and 20. We calculate all
metrics for each set of 100 concepts (with one interacted
and 99 unknown) in the test set. For each interacted concept
with respect to a user $u$, we generate the corresponding
recommendation list $R_u = \{r_u^1, r_u^2, \dots, r_u^k\}$, where $r_u^i$
indicates the concept ranked at the i-th position in $R_u$ based on
the predicted scores of those concepts.

Hit Ratio of top-k concepts (HR@k) measures the fraction
of relevant concepts in the test set that are in the top-k
concepts of the recommendations:

$$HR@k = \frac{1}{N} \sum_{u} I(|R_u \cap T_u|)$$

where $N$ is the total number of sets for testing, $T_u$ denotes the
relevant concepts in the test set, and $I(x)$ is an
indicator function which equals one if $x > 0$ and equals
zero otherwise. Normalized Discounted Cumulative Gain
(nDCG@k) takes into account rank positions of the relevant
concepts, and can be computed as follows:

$$nDCG@k = \frac{1}{Z} DCG@k = \frac{1}{Z} \sum_{j=1}^{k} \frac{2^{I(|\{r_u^j\} \cap T_u|)} - 1}{\log_2(j + 1)}$$

where $Z$ denotes the score obtained by an ideal top-k ranking,
which serves as a normalization factor. Mean Reciprocal Rank
(MRR) is the average of the reciprocal ranks of positive concepts:

$$MRR = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{rank_i}$$

where $rank_i$ refers to the rank position of the one interacted
concept in the corresponding set of 100 concepts with the rest
of unknown ones.

We use the paired t-test for testing significance, where the
significance level $\alpha$ is set to 0.05 unless otherwise noted.
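
Because each test set contains exactly one interacted concept, all three metrics can be computed from the 1-based rank of that concept within its 100 candidates. The following plain-Python sketch relies on that simplification; the function names and the example ranks are assumptions for illustration.

```python
import math


def hr_at_k(ranks, k):
    """Fraction of test sets whose interacted concept is in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def ndcg_at_k(ranks, k):
    """nDCG@k with a single relevant concept per test set.

    The ideal ranking puts the concept at position 1, so the
    normalization factor is 1 and the gain is 1 / log2(rank + 1).
    """
    return sum(1 / math.log2(r + 1) for r in ranks if r <= k) / len(ranks)


def mrr(ranks):
    """Mean of the reciprocal ranks of the interacted concepts."""
    return sum(1 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the interacted concept in three test sets.
ranks = [1, 3, 12]
hr5 = hr_at_k(ranks, 5)
ndcg5 = ndcg_at_k(ranks, 5)
mrr_value = mrr(ranks)
```

For these ranks, two of three concepts fall in the top 5, nDCG@5 is (1 + 1/2) / 3 = 0.5, and MRR is (1 + 1/3 + 1/12) / 3.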


5.1 Compared Methods 


To better understand and investigate the contribution of
each component and the performance with the two attention
mechanisms introduced in Section 4, we first compare
several variants of our approach. MOOCIR_a1 denotes our
approach with the attention mechanism only considering
different meta-paths using Eq. 3. MOOCIR_a2 refers to our approach
with the attention mechanism incorporating the latent
features of users (concepts) using Eq. 4. MOOCIR_a- is a variant
of our approach without any attention, i.e., different
meta-paths are treated equally and the representations learned
from those paths are averaged. MOOCIR_mf- refers to a variant
without the matrix factorization part for prediction in Eq. 6,
which only uses meta-path-based user and concept
representations for predicting the preference score of a concept.


Next, we compare MOOCIR with the following baselines and
state-of-the-art methods to evaluate the performance of
recommending knowledge concepts for users. TopPop is a
straightforward baseline method which ranks concepts based on
their popularity. Here, the popularity of a concept is
measured by the number of users that have learned
the concept. MFBPR [29] is a matrix factorization approach
which optimizes a pairwise ranking loss for the recommendation
task, as our approach does, but without meta-path-based
representation learning. That is, the second component in
Eq. 6 based on user (concept) representations is removed.
FISM [17] is an item-to-item collaborative filtering approach
which provides recommendations based on the average
embeddings of all interacted concepts and the embeddings of
the target concept. NAIS [15] is also an item-to-item
collaborative filtering approach, but with an attention mechanism,
which is capable of distinguishing which historical items in a
user profile are more important for a prediction. We use the
authors’ implementation for both NAIS and FISM⁵.
metapath2vec [11] is a meta-path-based representation
learning model which leverages meta-path-based random
walks to construct the heterogeneous neighborhood of
a node and then leverages a heterogeneous skip-gram model
to learn node embeddings. We use the StellarGraph [10]
implementation of metapath2vec for our experiment, in which
the parameters of metapath2vec are set the same as in [11]
except that the number of random walks is set to 500 instead
of 1000⁶. ACKRec [13] also models the MOOC dataset as a
HIN and extracts user (concept) representations from the
same set of meta-paths in Table 1. However, ACKRec treats
the problem as a rating prediction task where the rating of
a concept for a user is the number of interactions between
the user and the concept. Also, it exploits user and concept
representations as features while extending the matrix
factorization framework. We use the authors’ implementation
for our experiments. MFBPR and the MOOCIR variants are
implemented using TensorFlow [1]. All experiments are run
on a laptop with an Intel(R) Core(TM) i5-8365U processor and
16GB RAM, and the MOOCIR variants take less than two days
for training.


6. RESULTS 


Table 3 summarizes the results using the variants of MOOCIR.
As we can see from the table, MOOCIR_mf-, which uses user
and concept representations learned based on meta-paths
with the HIN but without the matrix factorization component,
provides worse performance compared to the other
variants. The results indicate that extending the matrix
factorization is necessary for MOOCIR.


Next, we compare MOOCIR_a- and the variants with attention
mechanisms (i.e., MOOCIR_a1 and MOOCIR_a2). We observe
that both MOOCIR_a1 and MOOCIR_a2 outperform MOOCIR_a- in
terms of all evaluation metrics, which shows that using
attention can indeed improve the performance and that different
meta-paths have different importance for deriving user
(concept) representations. This can be verified further by
investigating the learned attention weights for different meta-paths
in MOOCIR as well. For example, Fig. 3 shows a heatmap
of the learned attention weights for 100 randomly
selected users using MOOCIR. In the figure, the x-axis refers to
the 100 users and the y-axis indicates the attention weights for
the four different meta-paths for users described in Table 1
in Section 4. From the figure, we can notice that the first
meta-path (i.e., $\text{user} \rightarrow \text{concept} \xrightarrow{-1} \text{user}$) overall has a
higher weight compared to others. In addition, we observe
that the attention weights vary across users, which indicates
that the importance of each meta-path varies for different users.

⁵https://github.com/AaronHeee/Neural-Attentive-Item-Similarity-Model

⁶We noticed that using 1000 random walks took more than
10 days for training and did not improve the performance
compared to using 500.

Table 3: Performance of several variants of our proposed
approach in terms of different evaluation metrics with the
best-performing scores in bold.

             HR                      nDCG
             k=5     10      20      k=5     10      20      MRR
MOOCIR_mf-   0.676   0.812   0.906   0.499   0.543   0.567   0.468
MOOCIR_a-    0.701   0.832   0.920   0.513   0.556   0.578   0.477
MOOCIR_a1    0.704   0.836   0.922   0.520   0.562   0.584   0.484
MOOCIR_a2    0.703   0.838   0.922   0.517   0.561   0.583   0.482

[Figure 3 heatmap: "Attention weights for different meta-paths"; x-axis: randomly selected 100 users; y-axis: meta-paths for users]

Figure 3: Attention weights of different meta-paths for 100
randomly chosen users learned using MOOCIR_a1, where we
observe different weights of meta-paths for each user. In this
heatmap, a darker cell indicates a higher attention weight.

Finally, by comparing the two different attention mechanisms,
we observe that the one incorporating the latent features
of users and concepts (Eq. 4) does not improve the
performance compared to the simpler one (Eq. 3), which
is different from our assumption. Instead, we observe that
MOOCIR_a2 performs significantly worse than MOOCIR_a1 in terms
of HR@10 and HR@20 for the users who have interacted
with a limited number of concepts. Table 4 shows the
performance for three groups of users with fewer than 150, 350,
and 550 concepts, respectively. As we can see from the
table, MOOCIR_a1 outperforms MOOCIR_a2 significantly for the
first group of 353 users. The results suggest that fusing
information from the latent features of users (concepts) into
the attention mechanism is a non-trivial task, and other
approaches should be investigated in the future.

Table 4: Results of HR@10 and HR@20 for MOOCIR_a1 and
MOOCIR_a2 for three groups of users (G150, G350, G550) with
fewer than 150, 350, 550 concepts in the training set.

             HR@10                   HR@20
             G150    G350    G550    G150    G350    G550
MOOCIR_a1    0.806   0.830   0.851   0.894   0.911   0.927
MOOCIR_a2    0.801   0.829   0.852   0.886   0.908   0.925

Table 5: Performance of MOOCIR_a1 and compared methods in
terms of different evaluation metrics with the best-performing
scores in bold.

               HR                      nDCG
               k=5     10      20      k=5     10      20      MRR
TopPop         0.486   0.629   0.767   0.343   0.390   0.425   0.332
MFBPR          0.668   0.811   0.907   0.481   0.527   0.552   0.448
FISM           0.584   0.701   0.800   0.438   0.476   0.501   0.418
NAIS           0.568   0.691   0.811   0.420   0.461   0.491   0.403
metapath2vec   0.642   0.773   0.873   0.468   0.511   0.537   0.440
ACKRec         0.659   0.764   0.842   0.503   0.538   0.557   0.475
MOOCIR_a1      0.704   0.836   0.922   0.520   0.562   0.584   0.484


Overall, MOOCIR_a1 provides the best performance among all
variants. In the following, we discuss the performance of
MOOCIR_a1 compared with other baselines and state-of-the-art
methods.


Table 5 shows the performance of MOOCIR_a1 and the compared
methods. We first observe that all the other methods
outperform TopPop, which is a baseline method recommending
popular concepts. For example, MOOCIR_a1 and ACKRec
improve MRR over TopPop by 45.8% and 43.1%, respectively.
Among all the compared methods in Table 5, MOOCIR_a1
provides the best performance, followed by ACKRec, MFBPR, and
metapath2vec. Among the compared methods, ACKRec performs
best in terms of nDCG and MRR, and MFBPR performs best in
terms of HR. In detail, a significant improvement of
MOOCIR_a1 over ACKRec in MRR (+1.9%), nDCG@5 (+3.1%),
nDCG@10 (+4.5%), and nDCG@20 (+4.8%) can be noticed
($\alpha < 0.01$). Compared to MFBPR, MOOCIR_a1 improves the
HR scores by 6.7%, 9.2%, and 9.4% when k = 5, 10, and 20,
respectively ($\alpha < 0.01$). The two item-item CF methods (FISM
and NAIS) do not perform well compared to MFBPR and ACKRec.
One possible explanation is the sparsity
of the dataset, which makes deriving item-item similarities
based on interacted users for each item challenging
and limits the performance.
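
The relative improvements over TopPop quoted above follow from the MRR column of Table 5. A short worked check in Python, with the helper name `relative_improvement` chosen here for illustration:

```python
def relative_improvement(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100


# MRR values taken from Table 5.
mrr_toppop = 0.332
gain_moocir = relative_improvement(0.484, mrr_toppop)
gain_ackrec = relative_improvement(0.475, mrr_toppop)
```

Rounded to one decimal place, these evaluate to 45.8 and 43.1, matching the percentages reported for the MRR comparison with TopPop.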


These results indicate that the proposed approach MOOCIR_a1
can achieve competitive performance in terms of those
evaluation metrics for top-k concept recommendations compared
to the baselines and state-of-the-art methods.


7. CONCLUSIONS AND FUTURE WORK 


In this paper, we presented MOOCIR for predicting and
recommending concepts that might be of interest to users on MOOC
platforms. The comparison of MOOCIR variants in Section 6 
shows that extending the matrix factorization with user and 
concept representations learned from different meta-paths 
and using attention for deriving those representations play 
crucial roles in achieving better performance. In addition, 
the results compared to other baselines and state-of-the-art
methods indicate that MOOCIR_a1 can improve the performance
of predicting and recommending concepts significantly.
The comparison between the two introduced attention
mechanisms (Eq. 3 and 4) suggests that a more comprehensive
approach is required when fusing the latent features
of users and concepts into the attention mechanism, which
will be investigated in the near future.